Minimum Error Rate Training Semiring
نویسندگان
چکیده
Modern Statistical Machine Translation (SMT) systems make their decisions based on multiple information sources, which assess various aspects of the match between a source sentence and its possible translation(s). Tuning a SMT system consists in finding the right balance between these sources so as to produce the best possible output, and is usually achieved through Minimum Error Rate Training (MERT) (Och, 2003). In this paper, we recast the operations implied in MERT in the terms of operations over a specific semiring, which, in particular, enables us to derive a simple and generic implementation of MERT over word lattices.
منابع مشابه
Minimum Error Rate Training and the Convex Hull Semiring
We describe the line search used in the minimum error rate training algorithm (Och, 2003) as the “inside score” of a weighted proof forest under a semiring defined in terms of wellunderstood operations from computational geometry. This conception leads to a straightforward complexity analysis of the dynamic programming MERT algorithms of Macherey et al. (2008) and Kumar et al. (2009) and practi...
متن کاملOptimizing Expected Word Error Rate via Sampling for Speech Recognition
State-level minimum Bayes risk (sMBR) training has become the de facto standard for sequence-level training of speech recognition acoustic models. It has an elegant formulation using the expectation semiring, and gives large improvements in word error rate (WER) over models trained solely using crossentropy (CE) or connectionist temporal classification (CTC). sMBR training optimizes the expecte...
متن کاملLattice-Based Minimum Error Rate Training Using Weighted Finite-State Transducers with Tropical Polynomial Weights
Minimum Error Rate Training (MERT) is a method for training the parameters of a loglinear model. One advantage of this method of training is that it can use the large number of hypotheses encoded in a translation lattice as training data. We demonstrate that the MERT line optimisation can be modelled as computing the shortest distance in a weighted finite-state transducer using a tropical polyn...
متن کاملFirst- and Second-Order Expectation Semirings with Applications to Minimum-Risk Training on Translation Forests
Many statistical translation models can be regarded as weighted logical deduction. Under this paradigm, we use weights from the expectation semiring (Eisner, 2002), to compute first-order statistics (e.g., the expected hypothesis length or feature counts) over packed forests of translations (lattices or hypergraphs). We then introduce a novel second-order expectation semiring, which computes se...
متن کاملImproved performance and generalization of minimum classification error training for continuous speech recognition
Discriminative training of hidden Markov models (HMMs) using segmental minimum classi cation error (MCE) training has been shown to work extremely well for certain speech recognition applications. It is, however, somewhat prone to overspecialization. This study investigates various techniques which improve performance and generalization of the MCE algorithm. Improvements of up to 7% in relative...
متن کامل